Abstract

Allopolyploidization in plants entails the merger of two divergent nuclear genomes, typically with only one set (usually 
maternal) of parental plastidial and mitochondrial genomes and with an altered cytonuclear stoichiometry. Thus, we 
might expect cytonuclear coevolution to be an important dimension of allopolyploid evolution. Here, we investigate 
cytonuclear coordination for the key chloroplast protein rubisco (ribulose 1,5-bisphosphate carboxylase/oxygenase), 
which is composed of nuclear-encoded, small subunits (SSUs) and plastid-encoded, large subunits. By studying gene 
composition and diversity as well as gene expression in four model allopolyploid lineages, Arabidopsis, Arachis, Brassica, 
and Nicotiana, we demonstrate that paralogous nuclear-encoded rbcS genes within diploids are subject to homogeniza- 
tion via gene conversion and that such concerted evolution via gene conversion characterizes duplicated genes (homo- 
eologs) at the polyploid level. Many gene conversions in the polyploids are intergenomic with respect to the diploid 
progenitor genomes, occur in functional domains of the homoeologous SSUs, and are directionally biased, such that the 
maternal amino acid states are favored. This consistent preferential maternal-to-paternal gene conversion is mirrored at 
the transcriptional level, with a uniform transcriptional bias of the maternal-like rbcS homoeologs. These data, repeated 
among multiple diverse angiosperm genera for an important photosynthetic enzyme, suggest that cytonuclear coevolu- 
tion may be mediated by intergenomic gene conversion and altered transcription of duplicated, now homoeologous 
nuclear genes. 
Introduction

Polyploidy is a prominent evolutionary process in plants, in 
which two or more parental genomes are combined into the 
same nucleus. Through multiplying a single genome or via 
combining divergent genomes, autopolyploids and allopoly- 
ploids are formed, respectively (Soltis and Soltis 2000; Wendel 
2000; Wendel and Doyle 2005). Although ancient polyploidy 
characterizes all flowering plant lineages (Jiao et al. 2011), 
recent allopolyploidy is observed in many plant lineages, in- 
cluding such well-known examples as Arabidopsis, Arachis 
(peanut), Brassica (cabbage), Nicotiana (tobacco), and 
Gossypium (cotton). In each of these genera, cytogenetic
and molecular evidence has revealed extant diploid species
that most closely resemble the diploid parents of the allo- 
polyploids (Koch et al. 2000; Inaba and Nishio 2002; Chase 
et al. 2003; Jakobsson et al. 2006; Seijo et al. 2007; Leitch et al. 
2008; Higgins et al. 2012; Bertioli et al. 2013). Comparative 
analyses of different allopolyploid species and their extant 
diploid relatives reveal that polyploidization results in com- 
plex and fascinating changes at different biological levels, in- 
cluding genomic alterations (loss of genes and nongenic 
elements and homoeologous genomic exchanges) (Lim 
et al. 2007; Salmon et al. 2010; Buggs et al. 2012), nonadditive
gene expression including expression dominance and biased
homoeolog expression (Hegarty et al. 2008; Rapp et al. 2009;
Flagel and Wendel 2010; Grover et al. 2012; Buggs 2013; Yoo
et al. 2013), and changes in epigenetic modifications (Wang 
et al. 2004; Madlung and Wendel 2013). 

In addition to these dynamic responses to polyploidization, 
there are potential stoichiometric disruptions caused by the 
combination of two nuclear genomes but inheritance of only 
one set of progenitor organellar genomes (usually maternal), 
suggesting a cytonuclear dimension to allopolyploid evolu- 
tion. Many aspects of cytonuclear coevolution have been 
considered for diploid plants and animals (Rand et al. 2004; 
Wolf 2009; Caruso et al. 2012; Burton et al. 2013), addressing a 
number of key topics such as the effects of cytonuclear inter- 
action on population fitness (Caruso et al. 2012; Burton et al. 
2013), the occurrence of compensatory coadaptative cyto- 
nuclear mutations (Rand et al. 2004), participation of cyto- 
nuclear coordination in hybrid breakdown (Burton et al. 
2013), and cytonuclear-epistasis-controlled nuclear genome 
imprinting (Wolf 2009). To date, though, the special circumstances
surrounding cytonuclear evolution in polyploids remain
largely unexplored. Previously, we investigated how
homoeologous nuclear genes of Gossypium allopolyploids 
encoding subunits of one protein complex evolved in a new
context where they need to interact with a subunit encoded 
by a gene from the plastome, inherited (in cotton) from only 
one of the two progenitor diploids (Gong et al. 2012). The 
model protein complex we utilized is Rubisco (Ribulose 1,5- 
bisphosphate carboxylase/oxygenase), an essential enzyme in 
carbon fixation during photosynthesis, which functions as 
octamer holoenzymes of small subunits (SSUs) encoded by 
a nuclear rbcS multigene family and large subunits (LSUs) 
encoded by a single plastid rbcL gene (Rodermel et al. 
1996). After characterizing rbcS and rbcL genic compositions
in Gossypium, we explored their cytonuclear coordination at
the genomic level, showing postpolyploidy, intergenomic, maternal-to-paternal
gene conversion between nuclear homoeologs
(Gong et al. 2012), in the direction opposite to that
exhibited overall in Gossypium polyploids (Salmon et al. 2010; 
Flagel et al. 2012; Paterson et al. 2012; Guo et al. 2014). At the 
transcriptional level, biased maternal rbcS homoeolog expres- 
sion was also demonstrated. 

Intrigued by these findings for Gossypium, we asked 
whether similar cytonuclear coordination would be observed 
as a general phenomenon for rubisco evolution in other poly- 
ploids. Toward that end, we selected four exemplary angio- 
sperm polyploid lineages, Arabidopsis, Arachis, Brassica, and 
Nicotiana, each of which has a well-understood phylogeny 
with extant model diploids and stabilized descendant allo- 
polyploids. The rubisco rbcS and rbcL genes in each lineage 
were characterized. Within each lineage, phylogenies were 
constructed for rbcS gene paralogs and orthologs in the dip- 
loid species and placed in the context of their species diver- 
gence. By analyzing the rbcS gene sequences in representative 
parental diploids and allopolyploids, we demonstrate a con- 
sistent pattern of postpolyploidy gene conversion among 
rbcS homoeologs. In addition, biased homoeolog expression 
of paternal homoeologs carrying maternal conversions was 
also confirmed in most polyploid species. These results have
general significance with respect to cytonuclear evolution in
plant allopolyploids. 

Results 

Maternal Inheritance and Divergence among 
rbcL Genes 

rbcL genes from diploid and polyploid species of all four poly- 
ploid lineages were cloned and sequenced (table 1). Except in 
Brassica, there are from 0.43% to 0.65% nonsynonymous sub- 
stitutions between the LSU proteins of the parental diploid 
species in each lineage. As expected, each polyploid carries the
rbcL copy inherited from its maternal parent. In Brassica, no
amino acid differences exist between the parental diploid 
species (table 1). Similar to observations for rbcL genes in 
diploid cottons (Gong et al. 2012), diverged amino acid residues
cluster in the C-terminal α/β-barrel domain and/or N-terminal
domains of LSU subunits (table 1), which together
form the active sites for rubisco (Spreitzer and Salvucci 2002). 
Notably, amino acid substitutions are also observed in the 
middle regions following the C-terminal domains, where the 
LSUs interact with the SSUs (Spreitzer and Salvucci 2002; 
Spreitzer et al. 2005). These substitutions raise the possibility of coevolutionary
pressures in allopolyploids that might inherit divergent
parental SSUs.

rbcS Composition in Diploids 

Prior to cloning rbcS homoeologs in the polyploids, we cloned 
parental rbcS genes and aligned these into orthologs for in- 
ferences of homoeology in the polyploids. As shown in the 
exemplary rbcS sequence alignment for Arabidopsis (fig. 1), 
gene structure (introns/exons) was ascertained using cloned 
cDNAs. rbcS genes in most genera have three exons separated 
by two introns, the latter accumulating most of the substitu- 
tions and indels (fig. 1 and supplementary figs. S1-S3, 
Supplementary Material online). In Nicotiana, however, 



Table 1. Nonsynonymous Substitutions of rbcL Sequences in Species^a of Four Polyploid Lineages.

Lineage       Amino Acid   Maternal Diploid                           Allopolyploid                Paternal Diploid
              Position
Arabidopsis                Arabidopsis thaliana (Columbia-0)          A. suecica (Sue16)           A. arenosa (Strecno)
              318          I                                          I                            V
              458          T                                          T                            R
              464          I                                          I                            V
Arachis                    Arachis duranensis (PI 219823)             Arac. hypogaea (PI 161303)   Arac. ipaensis (PI 468322)
              2            M                                          M                            I
              3            L                                          L                            S
              260          G                                                                       E
Brassica                   B. rapa (PI649186)                         B. napus (PI633141)          B. oleracea (PI385959)
                           No nonsynonymous substitution
Nicotiana                  Nicotiana sylvestris (A403750326)          N. tabacum (095-55)          N. tomentosiformis (NIC 479/84)
              124          R                                          R                            C
              422          K                                          K                            Q

^a Accession listed beside each species name. Allopolyploids are shown in the central column, with maternal and paternal parents on the left and right, respectively.

Fig. 1. Alignment of Arabidopsis rbcS orthologs and homoeologs with featured SNPs and gene conversion events highlighted in the exons. An
exemplary cloned cDNA at the bottom (in light blue) is aligned with genomic rbcS homologs to ascertain rbcS exons/introns structure. Only featured
SNPs and gene conversions in exonic regions are illustrated here. Conserved nucleotides in all orthologs and homoeologs are shown in gray. Homologs
of maternal and paternal origins are highlighted in orange and green, respectively. Species-specific SNP positions (748 and 776) are marked by yellow
ovals above the alignment blocks. Multiple genome-unique SNPs in diploid parental copies are shown in orange (maternal) and green (paternal) text.

there are three introns separating the coding region into four
exons (supplementary fig. S3, Supplementary Material 
online). In exons of rbcS paralogs in each parental diploid, 
there are species-specific (consistent polymorphic substitu- 
tion shared by all paralogs in the same species) and genome- 
unique (existing in a unique genome) single-nucleotide 
polymorphisms (SNPs), denoted in the exons of the align- 
ment (fig. 1, supplementary figs. S1-S3, Supplementary 
Material online). Two groups of genome-unique SNPs are 
further recognized: Category I includes genome-unique 
SNPs present in at least two paralogs of a specific species; 
category II SNPs are carried by only one rbcS paralog (fig. 1, 
supplementary figs. S1-S3, Supplementary Material online; 
table 2). Species-specific SNPs shared by all paralogs of the 
same species were detected only in Arabidopsis, where the 
two species-specific SNPs have "C (Cytosine)" and "A 
(Adenine)" at the 748th position and "T (Thymine)" and "C 
(Cytosine)" at the 776th position in Arabidopsis thaliana and 
A. arenosa, respectively (fig. 1, table 2). 
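
The SNP classes defined above can be made concrete with a small sketch. The Python function below is illustrative only: the function name, the string encoding of an alignment column, the "shared polymorphism" fallback label, and the paralog counts in the example are assumptions introduced here, not the procedure or data used in this study. It classifies a single aligned exonic position from the bases carried by every rbcS paralog in the maternal and paternal diploids.

```python
from collections import Counter


def classify_column(maternal, paternal):
    """Classify one aligned exonic position (illustrative sketch).

    maternal / paternal: strings holding the base of every rbcS paralog
    in the maternal / paternal diploid at this position.
    """
    m, p = set(maternal), set(paternal)
    if len(m) == 1 and len(p) == 1:
        # All paralogs agree within each species.
        return "conserved" if m == p else "species-specific"
    # Count bases that occur in only one of the two diploid genomes.
    unique_counts = [n for base, n in Counter(maternal).items() if base not in p]
    unique_counts += [n for base, n in Counter(paternal).items() if base not in m]
    if not unique_counts:
        return "shared polymorphism"  # assumed label; not a class used in the text
    # Category I: the genome-unique base is present in at least two paralogs.
    return "genome-unique I" if max(unique_counts) >= 2 else "genome-unique II"


# Position 748 as described in the text: C in A. thaliana, A in A. arenosa.
# The number of paralogs per species used here is an assumption.
print(classify_column("CCCC", "AAA"))
print(classify_column("TTC", "CCC"))
print(classify_column("TCC", "CCC"))
```

Checking for fixed differences first means that a position such as 748 is reported as species-specific rather than as a category I genome-unique SNP, which is one possible reading of the definitions above.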

To compare the fixation rates of exonic, genome-unique 
SNPs, we tabulated their numbers in diploids of each lineage 
and included data generated previously for Gossypium species 
(supplementary fig. S4, Supplementary Material online, table 
2). Because genome-unique SNPs in category I exist in multi- 
ple paralogs of the same diploid species, these SNPs are trea- 
ted as nucleotide mutations that are fixed and spread by local 
gene conversions. As shown, the proportions of fixed 
genome-unique SNPs in category I are variable among 
lineages, ranging from 1.04% in Arachis to 4.47% in Brassica 
(table 2). This divergence is related to organismal divergence 
time (supplementary fig. S4, Supplementary Material online, 
table 2), with the notable exception of Brassica. For this genus, 
in which the progenitor diploids are thought to have diverged 
approximately 3.5 Ma (Higgins et al. 2012), a much higher 
proportion of genome-unique SNPs (4.47%) in category I is 
observed. This is significantly higher than in similarly aged 
Arachis (diverged 3.5 Ma, Seijo et al. 2007), older 
Arabidopsis (diverged 5 Ma, Jakobsson et al. 2006) and 
Gossypium (Wendel et al. 2010), or even the more ancient 
Nicotiana lineage (diverged 15 Ma, Leitch et al. 2008) (sup- 
plementary fig. S4, Supplementary Material online, and 
table 2). Possible explanations for this exceptional divergence 
in Brassica are discussed below. Accordingly, Brassica was not 
included in the correlation calculation but still is shown in the 
regression plot (supplementary fig. S4, Supplementary 
Material online). Apart from Brassica, a significant correlation
was observed between divergence time and the fixation rate of exonic category I
genome-unique SNPs (R² = 0.53585, P value < 0.05) (supplementary
fig. S4, Supplementary Material online).



To understand the evolutionary history of the diploid rbcS 
orthologs, phylogenetic trees were constructed in the context 
of diploid species divergence within each lineage (fig. 2). In all 
cases, gene copy numbers are based on published genome 
sequences in conjunction with the cloning and sequence 
data. Unusually divergent rbcS paralogs are shown in blue, 
which includes orthologous groups 1A and a3 in Arabidopsis,
A1 and B1 in Arachis, A1 and C1-C3 in Brassica, and S5 and
T5 in Nicotiana. Because gene conversion at the diploid level 
has homogenized sequence pairs in many cases, the number 
of different gene copies is lower than the number of actual 
gene copies. In figure 2, homogenized copies are shown by 
interacting double helices. Among the species studied, the 
number of rbcS orthologs ranges from 4 to 12 (fig. 2). In 
some cases, autapomorphic substitutions arose following 
polyploidy, confirming the presence of gene converted and 
hence homogenized duplicates at the diploid level. There was 
no loss of any homoeolog in any of the four allopolyploids 
studied. 

Gene Conversion Events Following Allopolyploidy 
Comparison of each rbcS homoeolog with their parental 
orthologous copies revealed a number of autapomorphic nu- 
cleotide substitutions that have accumulated after formation 
of each polyploid (table 2). At the low end, in Nicotiana 
tabacum, 11 autapomorphic SNPs were detected, representing
2.40% of the exonic nucleotide positions. The highest levels
were in A. suecica and Gossypium hirsutum, with proportions
of 7.55% and 8.01%, respectively (table 2). As shown in supplementary
figure S5, Supplementary Material online, the level of autapo- 
morphic SNP presence is dependent on polyploid age; more 
recent polyploids have fewer SNPs. For example, in Arachis 
hypogaea and Brassica napus, polyploids of similar age 
(>5,000 and <10,000 years ago), almost equivalent proportions
of autapomorphic SNPs are detected (3.73% and 3.35%,
respectively). Nicotiana tabacum, a polyploid estimated to be less
than 200,000 years old, has an exceptionally small proportion
of SNPs, whereas A. suecica and G. hirsutum, the more ancient
polyploid species in our analysis (formed 12,000-300,000
years ago and 1-2 Ma, respectively), have the higher proportions
of exonic autapomorphic SNPs (supplementary fig. S5,
Supplementary Material online). 

We inferred the parental origin of each homoeolog in the 
polyploids through comparisons with their diploid orthologs. 
We then inspected each homoeolog for genome-diagnostic 
SNPs from a different rbcS gene, mindful of the possibility 
(Gong et al. 2012) of intergenomic gene conversions. 
Alternatively, intragenomic gene conversions are implicated 
when they exclusively involve diagnostic SNPs among 



Fig. 1. Continued 

Autapomorphic substitutions in polyploid homoeologs are shown in pink. Inferred intra- and intersubgenomic gene conversion events are in blue and 
red boxes, respectively. For the positions involved in intersubgenomic gene conversion, the parental origin of each intersubgenomic converted 
nucleotide is illustrated by color (maternal origin: orange; paternal origin: green). Polyploid homoeologs with mosaic filled color boxes are the 
copies having intergenomic conversions. At the bottom of each alignment block, numbered gene conversion events resulting in synonymous/ 
nonsynonymous substitutions are marked in blue and purple diamonds, respectively. 
homoeologous copies of the same parental origin. A summary
of these inferences of the intra- and intergenomic gene con- 
versions is illustrated for each lineage (figs. 1 and 3, supple- 
mentary figs. S1-S3 and S6-S8, Supplementary Material 
online). Together with previous findings for Gossypium poly- 
ploids, we note several features of the inter/intragenomic 
gene conversions: 1) most conversion events were intergenomic
(figs. 1 and 4 and supplementary figs. S1-S3,
Supplementary Material online). Specifically, except for 
three intragenomic conversions in A. suecica (1st, 2nd, and 
4th events among nine gene conversion events; fig. 1), there 
were no intragenomic conversions detected in other studied
polyploids, including Gossypium (Gong et al. 2012); 2) similar
to the short rbcS genes in Gossypium (Gong et al. 2012), 
intergenomic events altered the originally identical rbcS du- 
plicates (those linked by anastomosing lines in fig. 2) so they 
became distinguishable (different) at the polyploid level; for
example, two identical paralogs in A. arenosa became two
different homoeologs, A. suecica-Asa2a and A. suecica-Asa2b,
when the latter copy obtained maternal diagnostic
SNPs via the 5th-9th intergenomic conversion events (fig. 1);
and 3) most of the intergenomic events occurred in the pa- 
ternal homoeologs, using templates from the maternal 
homoeologs (figs. 1, 4, and supplementary fig. S1-S3, 



Table 2. Summary of Exonic Genome-Unique, Species-Specific, and Autapomorphic SNPs in Species of Five Polyploid
Lineages.^a

              Diploids                                                           Polyploids
Lineage       Genome-Unique SNPs                         Species-Specific SNPs   Autapomorphic SNPs
              (total = category I + category II)
Arabidopsis   35 (7.14%) = 11 (2.24%) + 24 (4.90%)       748th and 776th         37 (7.55%)
Arachis       16 (3.32%) = 5 (1.04%) + 11 (2.28%)        None                    18 (3.73%)
Brassica      54 (10.06%) = 24 (4.47%) + 30 (5.59%)      None                    18 (3.35%)
Nicotiana     54 (11.79%) = 14 (3.06%) + 40 (8.73%)      None                    11 (2.40%)
Gossypium     26 (4.73%) = 24 (4.37%) + 2 (0.36%)        546th and 629th         44 (8.01%)

^a Shown are the numbers and proportions of each SNP category across all sequenced exonic nucleotide positions.
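
The arithmetic behind table 2 can be checked with a short sketch. The Python snippet below is an illustration, not part of the original analysis: the dictionary layout and function names are assumptions, and the values are transcribed from table 2. It confirms that the genome-unique total in each lineage equals the sum of categories I and II, and it back-calculates the approximate number of exonic positions implied by each count and percentage. Because the published percentages are rounded, the implied lengths are only approximate and may differ slightly between columns of the same row.

```python
# Rows transcribed from table 2. Each entry is (count, percent of exonic positions)
# in the order: genome-unique total, category I, category II, autapomorphic.
TABLE2 = {
    "Arabidopsis": ((35, 7.14), (11, 2.24), (24, 4.90), (37, 7.55)),
    "Arachis": ((16, 3.32), (5, 1.04), (11, 2.28), (18, 3.73)),
    "Brassica": ((54, 10.06), (24, 4.47), (30, 5.59), (18, 3.35)),
    "Nicotiana": ((54, 11.79), (14, 3.06), (40, 8.73), (11, 2.40)),
    "Gossypium": ((26, 4.73), (24, 4.37), (2, 0.36), (44, 8.01)),
}


def implied_length(count, percent):
    """Number of exonic positions implied by a SNP count and its percentage."""
    return count / (percent / 100.0)


def check_row(row):
    """True if total == category I + category II for counts and percentages."""
    total, cat1, cat2, _autapomorphic = row
    counts_ok = total[0] == cat1[0] + cat2[0]
    # Allow a small tolerance for rounding in the published percentages.
    percents_ok = abs(total[1] - (cat1[1] + cat2[1])) < 0.015
    return counts_ok and percents_ok


for lineage, row in TABLE2.items():
    lengths = [round(implied_length(count, percent)) for count, percent in row]
    print(lineage, check_row(row), lengths)
```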

Fig. 2. Evolutionary history of rbcS genes in diploid species in four genera. Gene names in maternal and paternal diploid species are denoted in orange
and green, respectively. Unusually divergent rbcS paralogs are shown in blue, which includes orthologous groups 1A and a3 in Arabidopsis, A1 and B1 in 
Arachis, A1 and C1-C3 in Brassica, and S5 and T5 in Nicotiana. Because gene conversion at the diploid level has homogenized sequence pairs in many 
cases, the number of different gene copies is lower than the number of actual gene copies; homogenized copies are shown by anastomosing double 
helices. 

Fig. 3. Alignment of SSU proteins encoded by rbcS orthologs and homoeologs in the Arabidopsis lineage. Maternal and paternal origin of each rbcS
homolog is highlighted in orange and green, respectively. Conserved amino acids are shown in gray, whereas polymorphic amino acid substitutions
are in black. The synonymous/nonsynonymous substitutions caused by gene conversions are marked using different diamonds as in figure 1. Essential
interface regions in SSUs, the predicted βA/βB loops where SSUs contact LSUs, are shown by open gray boxes.

Fig. 4. Summary of gene conversions in multiple SSU domains. Gene
conversion events among homoeologs from the same and different
genomic origins, defined as intra- and intergenomic conversion
events, are shown in the right and left panels, respectively. Within
each functional SSU domain (on the x axis), the total numbers of conversion
events introducing synonymous and nonsynonymous amino
acid substitutions are denoted by green and blue bars, respectively. 
The pink and red frames around each green and blue bar highlight 
conversion directions, paternal to maternal (paternal state introduced 
into maternal homoeologs) and maternal-to-paternal (maternal state 
introduced into paternal homoeologs), respectively. 



Supplementary Material online) — in other words, gene con- 
versions occurred preferentially in the direction of introduc- 
ing maternal-diagnostic SNPs into paternal homoeologs 
(simplified as "maternal-to-paternal" conversions). This is 
also the case in Gossypium polyploid species (Gong et al. 
2012). Here, this is exemplified in A. suecica, where five of 
six intergenomic conversions entailed maternal-diagnostic 
SNPs detected in paternal homoeologs (fig. 1). 
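
The logic used above to infer intergenomic conversions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name, the dictionary-based data structures, and the alignment positions in the example are hypothetical, and the sketch ignores tract length and autapomorphic substitutions. Given the inferred parental origin of a polyploid homoeolog and a set of genome-diagnostic SNPs, it reports the positions at which the homoeolog carries the diagnostic base of the other parental genome.

```python
def find_intergenomic_sites(homoeolog, origin, diagnostic):
    """Flag candidate intergenomic conversion sites (illustrative sketch).

    homoeolog  : dict mapping alignment position -> base in the polyploid copy
    origin     : "maternal" or "paternal", the inferred parental origin
    diagnostic : dict mapping position -> (maternal_base, paternal_base)
    Returns (direction, positions carrying the other genome's diagnostic base).
    """
    if origin not in ("maternal", "paternal"):
        raise ValueError("origin must be 'maternal' or 'paternal'")
    sites = []
    for pos in sorted(diagnostic):
        maternal_base, paternal_base = diagnostic[pos]
        if maternal_base == paternal_base:
            continue  # position does not distinguish the two genomes
        foreign = maternal_base if origin == "paternal" else paternal_base
        if homoeolog.get(pos) == foreign:
            sites.append(pos)
    if origin == "paternal":
        direction = "maternal-to-paternal"
    else:
        direction = "paternal-to-maternal"
    return direction, sites


# Hypothetical diagnostic SNPs and a hypothetical paternal homoeolog.
diagnostic = {101: ("G", "A"), 150: ("T", "C"), 207: ("C", "C")}
paternal_copy = {101: "G", 150: "C", 207: "C"}
print(find_intergenomic_sites(paternal_copy, "paternal", diagnostic))
```

In this toy case the paternal homoeolog carries the maternal base at position 101 only, so a single maternal-to-paternal candidate site is reported.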

Protein sequences of all rbcS orthologs and homoeologs 
were predicted. Within the protein alignment, the aforemen- 
tioned gene conversions were discovered to generate nonsy- 
nonymous amino acid substitutions only in A. suecica and 
Arac. hypogaea; most gene conversions did not result in 
amino acid changes (figs. 3, 4, and supplementary figs. 
S6-S8, Supplementary Material online). In A. suecica, the 
7th and 8th conversion events brought maternal-specific
"G (Glycine)" and "T (Threonine)" residues into the paternal
homoeolog "A. suecica-Asa2b," in the process replacing the
paternal amino acid "N (Asparagine)" at those two positions
(fig. 3). Similarly, in Arac. hypogaea, the first conversion event
caused a nonsynonymous amino acid substitution in the "Arac.
hypogaea-AhB3b" homoeolog (supplementary fig. S6,
Supplementary Material online). 

Table 3. Comparisons of Homoeolog Expression in Five Polyploids.

Species                Homoeolog Pairs              Expression       Z Value =               Significance
                       in Comparison^a              Differences^b    Difference/√Variance    (one side)
Arabidopsis suecica    Asa2a vs. Asa2b              -327             -17.166                 P < 0.001
Arachis hypogaea       AhB3a vs. AhB3b              -2,908^c         -9.09                   P < 0.001
Brassica napus         BnC6a vs. BnC6b              2,202            47.057                  P < 0.001
Nicotiana tabacum      NtT3a vs. NtT3b              -4,559           -69.69                  P < 0.001
                       NtT4a vs. NtT4b              -4,166           -33.49                  P < 0.001
Gossypium hirsutum     GhD-short1 vs. GhD-short2    -1,870^c         -30.45                  P < 0.001

^a The homoeolog without maternal-to-paternal conversions is listed first.

^b Negative expression differences are interpreted as biased expression of homoeolog copies with maternal-to-paternal gene conversions relative
to the homoeologs without such conversions.

^c These two RNA sequencing experiments involved three biological replicates generated from mature leaves (Peggy Ozias-Akins, unpublished
data and SRA056385 in Yoo et al. 2013). Expression difference shown is from one replicate of each experiment. Significant expression
differences are consistently identified at the same P value level for all other replicates (not shown).
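
As a rough check on the significance column of table 3, the Z values can be converted to one-sided P values. The sketch below assumes that each Z value is referred to a standard normal distribution; the source does not state the test in this section, so this is an assumption, and the function and variable names are illustrative. The Z values are transcribed from table 3.

```python
import math

# Z values transcribed from table 3.
Z_VALUES = {
    "A. suecica Asa2a vs. Asa2b": -17.166,
    "Arac. hypogaea AhB3a vs. AhB3b": -9.09,
    "B. napus BnC6a vs. BnC6b": 47.057,
    "N. tabacum NtT3a vs. NtT3b": -69.69,
    "N. tabacum NtT4a vs. NtT4b": -33.49,
    "G. hirsutum GhD-short1 vs. GhD-short2": -30.45,
}


def one_sided_p(z):
    """Tail probability beyond |z| under a standard normal distribution."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))


for pair, z in Z_VALUES.items():
    # Very large |z| underflows to 0.0 in double precision, which is still < 0.001.
    print(pair, one_sided_p(z) < 0.001)
```

Under this assumption every comparison falls far below the 0.001 threshold, consistent with the table; the sign of Z indicates only which homoeolog of the pair is more highly expressed.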



We summarized the distribution of types of gene conver- 
sion across the different SSU functional domains (fig. 4). SSU 
proteins were partitioned into four domains: transit peptide
(signaling peptide for targeting pre-SSU to the plastid and transporting
it into the plastid); transit-loop interval region (mainly composed
of α-helix A between the signal peptide and the βA/βB loop);
βA/βB loop region (interface of SSU with LSU, which includes
the β-strands and their enclosed loop); and all other β strands
at the C-terminal end (Spreitzer and Salvucci 2002; Genkov
and Spreitzer 2009; Kim et al. 2010). No gene conversion was 
detected in the transit-loop interval in any polyploid. 
Consequently, this region was excluded from the summary 
bar chart (fig. 4). In addition, the major intergenomic conver- 
sions preferentially occurred in the transit peptides and the 
C-terminal /3 strands rather than in the /6A//3B loop region 
where SSUs interact with LSUs in the rubisco holoenzyme. 
Finally, in terms of the intergenomic conversion directions, 
the preferred "maternal-to-paternal" conversion events were 
detected in each SSU domain. All three nonsynonymous, in- 
tergenomic conversions introduced maternal amino acids 
into the paternal homoeologous SSUs (fig. 4). 

Biased Expression of Paternal rbcS Homoeologs with 
Maternal-Converted Regions 

To address whether homoeolog expression is biased according to the genomic origin of rbcS genes, and whether any such bias is correlated with intergenomic gene conversions, we compared transcript levels for all polyploids (table 3). Homoeolog expression levels were determined by multiplying the read coverage proportion of their specific SNPs by the total mapped rbcS reads (table 3 and supplementary table S4, Supplementary Material online). 
Within all polyploid species except B. napus, the paternal 
homoeologs with converted maternal segments were 
always significantly more highly expressed than their homo- 
eologous counterparts without such intergenomic conver- 
sions (table 3). In contrast, in B. napus, the paternal 
homoeolog without gene conversion (BnC6a) had signifi- 
cantly higher expression than its counterpart paternal 
homoeolog (BnC6b) with maternal-to-paternal conversions 
(table 3). 



Discussion 

Here, we extend our results on cytonuclear coevolution of rubisco genes in Gossypium allopolyploids (Gong et al. 2012) to four other model allopolyploids, Arabidopsis, Arachis, Brassica, and Nicotiana. Our goal was to explore the extent to which the genic and transcriptional biases observed in cotton are mirrored in other allopolyploids and thereby gain insight into the generality of our indications of cytonuclear coevolution. Specifically, our aims were to discern the genic copy numbers and structures of nuclear rbcS genes in different genera, their propensity for "gene conversion" at both the diploid and allopolyploid levels, and the possible interplay between these dynamics and those of the plastid-encoded rbcL gene. We further wished to assess whether there is biased expression of homoeologs in other genera, how this relates to gene conversion, and the degree of similarity among multiple, phylogenetically dispersed angiosperm allopolyploids. 

Potential Selection Pressure for Cytonuclear 
Coordination among rbcS Genes in Polyploids 

rbcL is widely utilized as a slowly evolving plastid gene for 
purposes of phylogenetic reconstruction of angiosperm families and orders. Accordingly, we expected little sequence evolution among congeneric species, and such is indeed the case for the data presented here (table 1). Yet several nonsynonymous differences are observed between rbcL genes from 
different diploid parents (except in Brassica), documenting 
maternal inheritance of the plastome in the allopolyploids, 
and indicating possible functional regions of LSUs that could 
conceivably apply selective pressure for optimization of bipa- 
rentally inherited rbcS-derived SSU proteins. Specifically, 
during diploid divergence, the LSU in three of the four 
genera studied here accumulated several amino acid substitutions at both the C and N termini. Considering that the C- and N-terminal domains contain the catalytic centers and the sites where the subunit interfaces with SSUs (Spreitzer et al. 2005; Genkov and Spreitzer 2009), the possibility exists that selection has operated on rbcS genes in the allopolyploid to optimize rubisco holoenzyme activity. As discussed below, the rbcS data are suggestive of this mechanism of compensation for most genera studied. Notably, Brassica is exceptional, with no amino acid divergence between parental LSUs, yet it too exhibits signatures of cytonuclear coevolution (see below), thereby implicating selection operating on other aspects of cytonuclear regulation. 

Concerted Evolution of rbcS Genes in Diploid Species 

In angiosperms studied to date, similarities of rbcS genes within species are often observed (Gong et al. 2012), especially among tandem rbcS paralogs, with lower similarities among physically dispersed rbcS paralogs. These observations, combined with phylogenetic evidence showing even lower similarities of rbcS orthologs in different species, have been taken as evidence that rbcS genes are frequently subjected to "concerted evolution," or sequence homogenization via gene conversion (Meagher et al. 1989; Clegg et al. 1997). 

Concerted evolution is also evident in most species studied 
here (fig. 2). These inferences are based on two sources of information: cloning and sequencing data, which provide diagnostic SNPs for rbcS paralogs, and genome sequence data, which provide gene number counts. The former includes both species-specific and genome-unique SNPs of 
the same genus. Species-specific SNPs reflect homogenization 
among paralogs within species, presumably from a gene con- 
version process that is evolutionarily sporadic. Interestingly, 
this process appears to be insufficiently frequent to comple- 
tely homogenize paralogs but sufficiently common that its 
footprints are visible in the current suite of rbcS genes in each 
species. Similar results were previously reported for 
Gossypium (Gong et al. 2012). Genome-unique SNPs in 
each species, as described previously, can be further sorted 
into two categories, which have experienced distinct evolu- 
tionary histories. Category I includes genome-unique SNPs 
present in at least two paralogs of a specific species 
(table 2), which likely are derived from relatively recent ho- 
mogenization via local/minor conversions among several 
rather than all paralogs. Category II includes most genome- 
unique SNPs (table 2), existing in single rbcS paralogs. We infer 
that these SNPs are the most recent substitutions generated 
in specific rbcS paralogs, such that they have not been ho- 
mogenized across any other paralog. Possible mechanistic 
hypotheses for this failure to homogenize include recency 
of these SNPs relative to the pace of gene conversion, and/ 
or spatial dispersal of these paralogs from other paralogs, so 
that the opportunities for gene conversion are lower. 
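
The distinction between the two categories amounts to a counting rule, which can be sketched as follows. This is an illustration only: the input layout (a mapping from alignment position to the set of paralogs carrying the genome-unique allele), the function name, and the example positions and paralog labels are all assumptions introduced here, not part of the analysis pipeline used in this study.

```python
from typing import Dict, Set


def categorize_genome_unique_snps(carriers: Dict[int, Set[str]]) -> Dict[int, str]:
    """Label each genome-unique SNP by how many paralogs of the species carry it.

    Category I:  present in at least two paralogs (partially homogenized).
    Category II: present in a single paralog (not homogenized to any other copy).
    """
    categories = {}
    for position, paralogs in carriers.items():
        if not paralogs:
            raise ValueError(f"SNP at position {position} has no carrier paralog")
        categories[position] = "I" if len(paralogs) >= 2 else "II"
    return categories


# Hypothetical alignment positions and paralog names, for illustration only.
example = {
    101: {"paralog_1", "paralog_2"},
    245: {"paralog_3"},
}
labels = categorize_genome_unique_snps(example)
print(labels)
```

Under this rule a SNP moves from category II to category I as soon as a local conversion copies it into a second paralog, which is why the proportion of category I SNPs can be read as a footprint of homogenization.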

Another interesting dimension of our data is the relatively 
consistent fixation rate of genome-unique SNPs in different 
lineages. Given a significant positive linear correlation of the 
proportion of category I genome-unique SNPs with diver- 
gence time in most genera (supplementary fig. S4, 
Supplementary Material online), the balance between nucle- 
otide mutations in rbcS genes and their erasure via homog- 
enization may generally be similar among plant lineages. This 
suggestion clearly will benefit from additional study using other plant genera. It may be, for example, that life history features such as mating system, population-level dynamics, and effective population size create variation in this mutation-fixation balance. The higher fixation rates observed in the obligately outcrossing Brassica, for example, might reflect these factors (Wright et al. 2008; Ivanov and Gaude 2009). 

One somewhat ironic observation is that in some cases, 
more rbcS genes are detectable at the allopolyploid than the 
diploid level. This reflects both the absence of gene loss fol- 
lowing allopolyploidy and the evolution of novel SNPs 
postpolyploidy, which render previously identical paralogs 
(at the diploid level) nonidentical. For instance, the similar but distinct A. suecica-Asa2a and A. suecica-Asa2b in Arabidopsis (corresponding to two identical A. arenosa-a2 paralogous copies), and one additional short-type rbcS homoeolog in Gossypium (Gong et al. 2012), are cases in which different genes are observed at the polyploid level because mutation has outpaced homogenization. A second example 
involves multiple distinct paralogs in one diploid species and a 
single group of identical paralogs in another diploid species, 
such as in Brassica, where there is orthology between B. oler- 
acea-C8a and B. oleracea-C8b and two identical B. rapa- Ad 
genes, and between B. rapa-A2a and B. rapa-A2b and two 
identical B. oleracea-C6 genes (fig. 2). 

The more extreme cases of escape from homogenization 
involve the near-independent rbcS copies in each diploid spe- 
cies of each lineage studied (blue lines in fig. 2). As proposed 
for Gossypium, this relative independence may be related to 
their distinct chromosomal locations (Gong et al. 2012). For 
example, in A. thaliana, three paralogs (1B, 2B, and 3B), with relatively higher sequence similarities, are all located on chromosome 5, whereas the 1A paralog, with the least similarity, is on chromosome 1. In Gossypium, relatively independent long 
and short paralog groups are also clustered on chromosomes 
11 and 1, respectively. In Brassica, three identical B. rapa-A'\ 
copies and its three B. oleracea-C orthologs (-C1 to -C3) have 
the lowest sequence similarity to the other paralogs in each 
diploid species (fig. 2). A parsimonious explanation for this 
observation is that after the originally clustered gene copies 
translocated to new genomic regions in the common ances- 
tor of B. rapa and B. oleracea, the three rbcS paralogs in B. 
rapa began to evolve independently from other rbcS paralogs, 
while still being subject to local gene conversion homogeni- 
zation pressures; the other three gene copies in B. oleracea, 
however, came to be distinguishable via novel mutations. In 
brief, physical dispersal could protect independent copies 
from global homogenization. 

Concerted Evolution of rbcS Homoeologs in 
Allopolyploids 

Because allopolyploidy entails the merger of two sets of rbcS 
genes, gene conversion can, in principle, homogenize not only 
paralogs but also homoeologs. Notably, there are many auta- 
pomorphic SNPs in the allopolyploids, some identified in ge- 
nomic conversion regions (shown as pink SNPs in each 
alignment file). Thus, these autapomorphic SNPs are new 
SNPs introduced by homogenization via gene conversion 
across homoeologs. These mutations appear to be related 
to the time since polyploidization, as the relatively older A. 
suecica and G. hirsutum have more of these SNPs than are 
observed in the relatively younger Arac. hypogaea and 
B. napus sequences (supplementary fig. S5, Supplementary 
Material online). 

Genomic Cytonuclear Coordination of rbcS in 
Allopolyploids 

Intergenomic coadaptation or coordination between the nu- 
clear and cytoplasmic organellar genomes is an essential com- 
ponent of evolutionarily successful hybridization events 
(Burton et al. 2013). Intergenomic interactions may be inter- 
rupted when hybridization occurs between genetically diver- 
gent populations, which combine divergent nuclear genomes 
with only a single set of cytoplasmic genomes. With respect to 
the rubisco complex, diverged nuclear rbcS homoeologs in- 
herited from both parental species may be posited to be 
targets of selection following genome merger and doubling 
at the time of polyploid formation, in response to their new 
cellular milieu containing only the maternal cytoplasm. 

As shown in Gossypium, one path toward reducing poten- 
tial cytonuclear conflict is "maternal-to-paternal," intergeno- 
mic homogenization of rbcS homoeologs, presumably to 
stabilize or optimize rubisco holoenzyme activity. 
Specifically, in the N-terminal transit peptide region, which 
possesses the necessary information for SSU targeting and 
transport into the chloroplast (Bruce 2000; Lee et al. 2002), 
the potential relief from inefficient recognition and transport 
of paternal SSUs into the maternal chloroplast could conceiv- 
ably be achieved by intergenomic, nonsynonymous gene con- 
versions of paternally inherited rbcS copies. This possibility is 
exemplified by the first conversion event in Arac. hypogaea-AhB3b (supplementary figs. S1 and S7, Supplementary Material online). Similarly, at the C-terminal β-strands domain, which maintains holoenzyme structural stability and also potentially regulates LSU/SSU interactions (Esquivel et al. 2002; Spreitzer and Salvucci 2002), paternal SSU homoeologs obtained maternal-like, C-terminal β-strands via intergenomic, nonsynonymous conversions both in Gossypium and in the currently studied genera (fig. 4). This group of converted, paternal SSUs could also be favored during or after the assembly process with the maternal LSUs in the holoenzyme. However, in the βA/βB loop region, where SSU proteins contact LSUs (Spreitzer et al. 2005; Genkov and Spreitzer 2009), no amino acid changes were introduced by intergenomic conversions in the currently studied allopolyploids, so all paternal SSUs maintained their original protein sequences. Two scenarios can explain this observation: 1) paternal SSUs have sufficient compatibility with the cytoplasmic LSU at this interface region, such that fitness is not compromised; and 2) insufficient time has elapsed for "more fit" genomic conversions to arise. In G. hirsutum, the loop regions of all divergent paternal SSUs have been replaced by the maternal loops via nonsynonymous, maternal-to-paternal gene conversions (Gong et al. 2012), suggesting an evolutionary future for these "caught in the act" younger allopolyploids. Targeted mutation experiments that introduce artificial maternal-to-paternal conversions into the βA/βB loop regions of paternal rbcS homoeologs would help to evaluate these scenarios. 

The evidence presented here is consistent with, but does not prove, preferential selection for the products of intergenomic, maternal-to-paternal gene conversions (among intragenomic, maternal-to-paternal, and paternal-to-maternal events) across paternal nuclear homoeologs, followed by homogenization of the selected conversions across other copies originating from the paternal genome. Specifically, in the polyploid species, in addition to the maternal-to-paternal intergenomic conversions, both intragenomic conversions and paternal-to-maternal intergenomic conversions have probably occurred in paternal rbcS homoeologs following polyploidization, and were detected across different SSU domains (fig. 4). Cytonuclear coevolutionary pressure may thus have preferentially selected intergenomic, maternal-to-paternal conversions. Given the 
relatively recent formation (<0.5 Ma) of all allopolyploids 
analyzed here, intragenomic and paternal-to-maternal con- 
versions remain evident, perhaps having had insufficient time 
to homogenize the putatively beneficial maternal-to-paternal 
conversions across all rbcS copies, some of which retain their 
original parental diagnostic SNPs. In Gossypium, where poly- 
ploidy originated 1-2 Ma, the maternal, genome-specific 
SNPs have been homogenized across all paternal homoeologs. 
Additional evidence from other genera will further inform this 
possible evolutionary scenario. 

A special case exists in B. napus, which inherited the ma- 
ternal diploid LSUs with no amino acid divergence from the 
paternal LSUs. Given this observation, and the assumption 
that this would eliminate the possibility of selection at the 
level of SSU/LSU interaction, one might expect random in- 
terchanges among homoeologs irrespective of parental origin. 
Yet even in Brassica only intergenomic, maternal-to-paternal 
conversions were detected (in the paternal homoeolog, 
BnC6b). Relevant to this observation is the fact that SSU 
proteins need to be recognized by multiple cytoplasmic fac- 
tors and transported to the surface membrane of the mater- 
nally derived plastid, where they are subjected to 
transmembrane transport into the plastids. It is possible 
that the gene conversion observed here reflects selection at 
this level, during some stage or process involved with mater- 
nal trans-membrane transport (Bruce 2000; Lee et al. 2002). 
Testing this idea is experimentally feasible, for example, by introducing targeted mutations into the maternal-to-paternal conversion region of the BnC6b homoeolog and comparing SSU protein accumulation in the plastid stroma with that in control B. napus individuals. At present, we 
are reporting an intriguing phenomenon that is suggestive 
of a newly described dimension of cytonuclear evolution. 

Transcriptional Cytonuclear Coordination of 
rbcS Homoeologs 

In addition to the gene sequence data and gene conversion 
evidence for cytonuclear accommodation to the polyploid 
state, we also explored gene expression levels to test whether 
there is biased expression of maternally derived rbcS genes. In 
three of the four allopolyploids (all but B. napus), paternal homoeologs with maternal-to-paternal conversions uniformly displayed preferential expression relative to the paternal homoeologs with no intergenomic conversion, consistent with our previous observations in Gossypium (Gong et al. 2012). This repeatedly observed, biased homoeolog expression among diverse allopolyploids is suggestive of selection at the level of transcript accumulation, with a fitness advantage for SSU-encoding transcripts that carry maternal-like sequences. We note that biased expression of the paternal homoeolog with no maternal conversions was observed in Brassica. This could be explained by two possible scenarios: 1) relatively weak selection imposed by the maternal LSU in the plastid, which is identical to the paternal LSU; and 2) insufficient time for transcriptional selection to arise. 

Here, we have explored two dimensions of possible coor- 
dination and regulation of rubisco component subunits fol- 
lowing allopolyploidization in plant species. We have 
confirmed that concerted evolution among divergent ances- 
tral, duplicated copies of rbcS genes is a consistent feature of 
allopolyploid plants. We have shown that interparalog gene 
conversion is common at the diploid level and that it continues among homoeologs at the allopolyploid level, with a preferential occurrence of maternal-to-paternal, intergenomic conversions in the signaling and regulatory domains of SSU genes. In most allopolyploids, this is accompanied by biased 
expression of paternal homoeologs carrying maternal-like 
gene conversions. Taken together, these data are consistent 
with cytonuclear selection following the reunion of two di- 
verged genomes in a single cytoplasm as a consequence of 
allopolyploid speciation. Importantly, our analysis focuses 
only on cytonuclear coevolution of rubisco genes at the 
DNA and RNA levels; clearly much work remains for other 
potentially relevant dimensions of the problem, including 
studies of incorporation efficiency of divergent 
homoeologous SSUs into the rubisco holoenzyme, similar ex- 
plorations in other cytonuclear coencoded complexes 
assembled in cytoplasmic organelles, stoichiometric changes 
in organelle and organellar genome abundances in each 
polyploid cell compared with the cells of their diploid parents, 
and many other dimensions of protein trafficking into 
organelles. 

Materials and Methods 

DNA and RNA Extraction and cDNA Synthesis 
Four angiosperm polyploid lineages were selected, each of 
which included model progenitor diploids and derived allo- 
polyploids (table 1). Fully expanded leaves of each species in 
each genus were sampled at the same developmental stages. 
After washing with diethylpyrocarbonate (DEPC)-treated 
water, leaves were divided into two parts, which were used 
for DNA and RNA extraction, respectively. DNA extraction, 
RNA extraction, and cDNA synthesis were carried out follow- 
ing methods described previously (Gong et al. 2012). 

Primer Design, Cloning, and Sequencing 

rbcL is highly conserved among closely related species (Gielly and Taberlet 1994). We downloaded from the National Center for Biotechnology Information (NCBI) all available rbcL genes in the genera studied for primer design (supplementary table S1, Supplementary Material online). For lineages not represented in the NCBI collection, the rbcL gene sequence from a closely related genus was used as the query sequence for a BLASTn search against the expressed sequence tags (ESTs) in PlantGDB (http://www.plantgdb.org/, last accessed April 28, 2014; supplementary table S1, Supplementary Material online). Manually aligned sequences of ESTs at the 5'- and 3'-ends of the original BLAST query sequence (covering the start and stop codons, respectively) were used for primer design. Degenerate primers used to amplify full-length rbcL genes in each species are tabulated in supplementary table S1, Supplementary Material online. 

Available genome assemblies of sequenced species and 
their ESTs deposited in PlantGDB were collected for rbcS 
primer design (supplementary table S2, Supplementary 
Material online). The rbcS genomic sequences of that 
species, or a related species in the same genus, or in some 
cases a related genus, were used as query sequences with 
BLASTn against genome assemblies or EST sequences. 
Significant homologous copies in each genome as- 
sembly or the manually aligned sequences of ESTs on the 
5'- and 3'-end of the original BLAST query sequence were 
used for primer design. Primers specifically amplify- 
ing orthologs or homoeologs of rbcS in each species are tab- 
ulated in supplementary table S2, Supplementary Material 
online. 

Polymerase chain reactions (PCRs) and PCR programs for amplifying rbcL and rbcS genes, PCR product cloning, and sequencing followed the methods described earlier (Gong et al. 2012). Only the annealing temperature (at the initial step and in stabilized loops) was adjusted for each primer (supplementary table S2, Supplementary Material online). The same sets of primers designed above for amplifying genomic rbcS genes were also used for amplification of rbcS cDNAs. Final sequenced rbcL and rbcS genes were deposited in GenBank under accession numbers KM025240-KM025251 and KM025252-KM025337, respectively. 

When amplifying the rbcS genomic or transcript copies in 
each species, to avoid possible false rbcS PCR-recombination 
artifacts, three parallel independent PCRs were carried out for 
each primer sample. Only rbcS copies supported by at least 25% of the clones sequenced in each independent PCR were accepted as bona fide copies. Cloning and sequencing of the 
PCR products were also carried out as described (Gong et al. 
2012). 
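
The acceptance rule described above (support from at least 25% of clones in each of the independent PCRs) can be sketched as a simple filter. The function name, the input layout (one list of clone labels per PCR), and the example clone counts are assumptions made for illustration; they do not reproduce the scripts or data used in this study.

```python
from collections import Counter
from typing import List, Set

MIN_SUPPORT = 0.25  # threshold stated in the text: at least 25% of clones per PCR


def bona_fide_copies(pcr_clones: List[List[str]],
                     min_support: float = MIN_SUPPORT) -> Set[str]:
    """Return the copies supported by >= min_support of clones in every PCR.

    pcr_clones holds one list per independent PCR; each element is the label
    of the rbcS copy to which a sequenced clone was assigned.
    """
    accepted = None
    for clones in pcr_clones:
        if not clones:
            return set()  # a PCR without clones cannot support any copy
        counts = Counter(clones)
        passing = {copy for copy, n in counts.items()
                   if n / len(clones) >= min_support}
        # A copy must pass the threshold in this PCR and in all earlier ones.
        accepted = passing if accepted is None else accepted & passing
    return accepted or set()


# Hypothetical clone assignments from three independent PCRs (10 clones each).
pcrs = [
    ["copyA"] * 5 + ["copyB"] * 4 + ["recombinant"] * 1,
    ["copyA"] * 4 + ["copyB"] * 6,
    ["copyA"] * 3 + ["copyB"] * 3 + ["recombinant"] * 4,
]
accepted = bona_fide_copies(pcrs)
print(sorted(accepted))
```

In this made-up example the recombinant sequence passes the threshold in only one of the three PCRs, so it is rejected as a likely PCR artifact, whereas the two copies recovered consistently are retained.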

Sequence Alignment and Phylogenetic Reconstruction 
Sequences were aligned within each genus using the online 
MAFFT tool V7.122 (Katoh and Standley 2013). After manual 
adjustment, synonymous and nonsynonymous substitutions 
in the exons were noted. Phylogenetic histories of the rbcS 
multigene family in all diploids of each lineage were inferred 
based on parsimony analysis. 

Detection of Homoeologous SNPs and Gene 
Conversion Events (Nonreciprocal Homoeologous 
Recombination) 

For each genus, genome-diagnostic SNPs (including the spe- 
cies-specific and genome-unique SNPs) and autapomorphic 
SNPs were inferred in rbcS orthologs and homoeologs, respec- 
tively. Genome-diagnostic SNPs were used to determine the 
parental genomic origin of each homoeolog. Autapomorphic 
SNPs are defined as novel nucleotides arising at the polyploid 
level (in either homoeolog). Within the allopolyploid species, 
the possible exonic rbcS genomic conversion regions or points 
of "non-reciprocal recombination" (in only one direction 
from paternal to maternal homoeolog or vice versa; Salmon 
et al. 2010) were initially inferred using the GENECONV tool 
(automated recombination detection in triplet sequences), 
which is incorporated in RDP4 Beta 4.27 software (Sawyer 
1989; Martin et al. 2010). Specifically, each rbcS homoeolog 
in the polyploids was searched against both reference diploid 
orthologs and other homoeologs: Any recombinations iden- 
tified between homoeologs of the same genomic origin were 
inferred as intragenomic conversions, whereas those involving 
homoeologs of different genomic origin were accepted as 
products of intergenomic conversion events. Conversion copies identified by the recombination detection program (RDP) were further processed with custom Perl scripts, which tabulated the SNP information within converted homoeologs as previously described (listing the coordinates in the alignment and the nucleotide changes before and after the conversions; Gong et al. 2012). As noted previously, to avoid possible artificial 
PCR recombinants, only recombinants occurring in at least 
25% of the total cloned sequences from each replicated PCR 
were accepted as true "gene conversion" copies. 
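
The labeling logic applied to each detected event (intragenomic when donor and recipient share a genomic origin, intergenomic otherwise, with a direction) can be sketched as below. This is not the Perl code used in the study and does not call RDP or GENECONV; the `ConversionEvent` record and its field names are hypothetical, and the event is assumed to have been detected already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionEvent:
    recipient: str         # homoeolog that received the converted tract
    recipient_origin: str  # genomic origin of the recipient: "maternal" or "paternal"
    donor_origin: str      # genomic origin of the donor sequence


def classify_conversion(event: ConversionEvent) -> str:
    """Label an event as intragenomic or as a directed intergenomic conversion."""
    valid = {"maternal", "paternal"}
    if event.recipient_origin not in valid or event.donor_origin not in valid:
        raise ValueError("origins must be 'maternal' or 'paternal'")
    if event.recipient_origin == event.donor_origin:
        return "intragenomic"
    # Direction is reported as donor-to-recipient.
    return f"intergenomic, {event.donor_origin}-to-{event.recipient_origin}"


# BnC6b is described in the text as a paternal homoeolog carrying a maternal tract.
label = classify_conversion(ConversionEvent("BnC6b", "paternal", "maternal"))
print(label)
```

Reporting the direction as donor-to-recipient matches the "maternal-to-paternal" wording used throughout the text, in which a paternal homoeolog acquires maternal sequence.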

Statistical Comparison of rbcS Homoeolog Transcript 
Level Based on RNAseq 

Next-generation RNA sequencing data of all polyploids 
in four lineages were collected from SRA databases in NCBI 
and other resources (supplementary table S3, Supplementary 
Material online). Quality-filtered reads were mapped to all 
cloned rbcS homoeologs via Bowtie 1.0.0 with stringent per- 
fect match control (Langmead et al. 2009). The final rbcS 
homoeolog-specific expression proportions were obtained 
by dividing the mapped reads covering all diagnostic homo- 
eolog-specific SNPs of each homoeolog copy by the total 
reads mapped to those SNP positions in all expressed rbcS 
homoeologs. The coverage of each SNP in each homoeolog 
was obtained by running the mpileup module in the 
samtools package (Heng et al. 2009). The final observed 
rbcS homoeolog-specific expressions were obtained by mul- 
tiplying their individual estimated expression proportion by 
the total mapped reads. 
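
The two-step estimate described above (a proportion from reads covering diagnostic SNPs, then scaling by the total mapped reads) can be sketched as follows. The function name, the homoeolog labels, and the read counts are hypothetical; the sketch assumes that SNP-covering read counts per homoeolog have already been extracted from the mpileup output.

```python
from typing import Dict


def homoeolog_expression(snp_reads: Dict[str, int],
                         total_mapped_reads: int) -> Dict[str, float]:
    """Estimate expression per homoeolog as proportion * total mapped reads.

    snp_reads maps each homoeolog to the number of reads covering its
    diagnostic, homoeolog-specific SNPs.
    """
    covered = sum(snp_reads.values())
    if covered == 0:
        raise ValueError("no reads cover the diagnostic SNP positions")
    return {name: total_mapped_reads * n / covered
            for name, n in snp_reads.items()}


# Hypothetical counts: 100 SNP-covering reads, 1,000 reads mapped to rbcS overall.
expression = homoeolog_expression({"H_a": 30, "H_b": 70}, total_mapped_reads=1000)
print(expression)
```

Because the proportions are computed only from reads that overlap diagnostic SNPs, reads mapping to regions identical between homoeologs contribute to the total but not to the partitioning.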

Given the high expression levels of rbcS genes in plant species, under the Central Limit Theorem (Rice 2006), a Z statistic evaluating the expression difference between homoeologs with intergenomic gene conversion and paralogous homoeologs without intergenomic gene conversions (here abbreviated as "H_converted vs. H_nonconversion") was calculated as follows: 



(1) The null hypothesis assumed no homoeolog expression difference in H_converted versus H_nonconversion. Hence, the expectation of the expression difference was zero. 
(2) The variance of the homoeolog expression difference in H_converted versus H_nonconversion was derived as follows. Under the assumption that the summed proportions of all rbcS homoeologs equal 1, the probability of obtaining the observed combination of rbcS homoeolog expression should follow the multinomial distribution. According to the known variance of one variable and covariance of two component variables in the multinomial distribution, 

$$\mathrm{Var}(H_{\mathrm{converted}} - H_{\mathrm{nonconversion}}) = N_{\mathrm{total}}\left[P_{H\mathrm{converted}}(1 - P_{H\mathrm{converted}}) + P_{H\mathrm{nonconversion}}(1 - P_{H\mathrm{nonconversion}}) + 2\,P_{H\mathrm{converted}}\,P_{H\mathrm{nonconversion}}\right],$$

in which N_total was the total expression of all rbcS homoeologs in the polyploid species, and P_Hconverted and P_Hnonconversion were the expression proportions of homoeologs with and without intergenomic conversions. 
(3) A final Z statistic was calculated with all terms replaced by their values calculated as above: 

$$Z = \frac{\mathrm{Difference} - 0}{\sqrt{\mathrm{Var}(H_{\mathrm{converted}} - H_{\mathrm{nonconversion}})}},$$

where Difference is the observed expression difference between the two homoeologs (table 3). The P value of each estimated Z statistic was estimated based on the standard normal distribution. 
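
The test outlined in steps 1 to 3 can be sketched in a few lines, assuming the standard multinomial moments (variance N p (1 - p) for one component and covariance -N p_i p_j between two components). The function name and the example inputs are hypothetical, and the sign convention (first homoeolog minus second) is an assumption chosen to match the ordering used in table 3.

```python
import math
from typing import Tuple


def z_test_expression_difference(n_total: float,
                                 p_first: float,
                                 p_second: float) -> Tuple[float, float, float]:
    """Return (difference, Z, one-sided P) for expression of first minus second.

    n_total  : total expression (mapped reads) of all rbcS homoeologs
    p_first  : expression proportion of the homoeolog listed first
    p_second : expression proportion of the homoeolog listed second
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    difference = n_total * (p_first - p_second)
    # Var(X1 - X2) = Var(X1) + Var(X2) - 2 Cov(X1, X2) under the multinomial model.
    variance = n_total * (p_first * (1 - p_first)
                          + p_second * (1 - p_second)
                          + 2 * p_first * p_second)
    if variance <= 0:
        raise ValueError("variance must be positive")
    z = difference / math.sqrt(variance)  # expectation under the null is zero
    # One-sided tail probability of the standard normal in the observed direction.
    p_value = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return difference, z, p_value


# Hypothetical inputs: 1,000 total reads, proportions 0.3 and 0.7.
diff, z, p = z_test_expression_difference(1000, 0.3, 0.7)
print(round(diff), round(z, 2), p < 0.001)
```

With these made-up inputs the variance is 1000 x (0.21 + 0.21 + 0.42) = 840, so a negative Z indicates higher expression of the second-listed homoeolog, following the convention stated in the footnotes to table 3.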

Supplementary Material 

Supplementary figures S1-S8 and tables S1-S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). 